Thursday, March 3, 2016

Going for Hybris interview?

Are you ready? You can judge your Hybris knowledge yourself by describing how you would approach the use cases below.

Tell me how you would achieve each of the following in Hybris. If you can't, it is the right time to explore wiki.hybris.com :)


  • I want to send localized email on registration completion. 
  • I want to define my custom business process flow for order completion.
  • I want to reskin/rebrand out of the box accelerator.
  • I want to implement differential pricing - same product but different prices for different customers.
  • I want to accept payment through multiple cards - Multi tendering.
  • I want to migrate Order data and its dependencies from one version of hybris to another.
  • I want to define a Fraud workflow in which customer service reviews the order and then either approves or rejects it.
  • I want to enable pickup in my physical store.
  • I have stock available at multiple locations and I want to aggregate it to surface online.
  • I want to split an order into multiple consignments depending on the fulfillment location nearest to the delivery address.
  • I want to take orders even when I don't have stock right now.
  • I want to integrate with Multiple payment gateways like credit card, paypal, EFTPOS etc.
  • I want to design one product catalog per country, with a master catalog.
  • I want to segment my customers depending on their purchase history.
  • I want to add new search facets and price ranges to make decision making easier for customers.
  • I want a Mega menu in the header with a depth of 4.
  • I want my content to be searchable.
  • I want to add new custom restrictions on promotion.
  • I want my new types not to be Cached in Hybris.
  • I want my site to be Responsive.
  • I want to write test cases before development of actual service - ATDD.
  • I want to create a catalog version aware type.
  • I want to ensure my site is free from SQL injection and cross site scripting.
  • I want certain columns of a type to be kept encrypted.
  • I want my hybris application to be PCI certified.
  • I want to write a Flexi search that involves many to many relation types.
  • I want to define a flexi search in Impex for exporting data.
  • I want to identify root cause of a Memory leak situation.
  • I want to ensure JVM memory areas and the GC algorithm are set according to my application's expected data volumes and transactions.
  • I want to do a deployment without any outage.
  • I want to share my wish list through a pretty URL.
  • I want my media objects to be stored in cloud.
  • I want to tell my team when to use platformwebservices or OCC or datahub services.
  • I want to decide whether I should create an Addon or an extension based on a template.
  • I want to define my custom hybris flavour using the latest hybris recipe-based installation.
  • I want to decide where I should configure my data: initial or project.
  • I want customers to be able to locate my stores and see the approximate distance from their location.
  • I want to enable Social login instead of asking customer for registration for placing an order.
  • I want to ensure my db connection pooling is configured in an optimized way.
  • I want to enable OAUTH based access of my services.
  • I want to enable Single sign on in my enterprise landscape, of which the hybris application is a part.
  • I want to add a new button in customer service cockpit.
  • I want my custom types to be reflected into hmc.
  • I want to understand how hybris has implemented ORM concept.
  • I don't want to keep passwords in properties files in plain text format.


I will keep adding more here...

  • I want to create an invoice in pdf format for my order.
  • I want to keep removing history records on a regular basis before they become a bottleneck.
  • I want to create a digital service product which does not have any stock and may have associated conditional products which may influence the price.






Tuesday, July 29, 2014

Endeca data modeling for best performance

ECommerce is all about selling products online. To capitalize on a sales opportunity and leave an impression on the customer, you have to present your products as quickly as possible in a nice, clean way.


The first thought that comes to mind for the ‘as quickly as possible’ requirement is caching:


  • Browser cache – e.g. local downloaded objects
  • CDN cache – e.g. can cache whole page static or dynamic (with ajax calls for loading customer specific data), cache static content, cache redirects, cache error pages/responses  
  • Web server cache – e.g. cache assets in disk/memory, persistent connection
  • Application cache – e.g. cache to reduce db calls, static objects holding common properties
  • Database cache – e.g. query caching, pre-compiled queries


Any medium or large eCommerce application should, and usually does, include some caching strategy in its architecture. But caching can’t reduce the response time of the first hit, when nothing is cached yet. Caching at any level also needs a unique way to identify that a second hit is exactly the same as the first: the cache key. What if there are infinite combinations and not all of them can be cached? Or what if you have so many session variables that each customer session becomes unique and effectively impossible to cache? Customers then start experiencing delays in page responses.
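
To make the cache key idea concrete, here is a minimal Python sketch. The function name make_cache_key, the URL path and the parameters are all hypothetical and only for illustration. The point is that two hits can share a cache entry only if everything that shapes the response is part of the key.

```python
import hashlib

def make_cache_key(path, params, session_vars=()):
    """Build a deterministic cache key from everything that shapes the response.

    `params` is a dict of query parameters; `session_vars` is a tuple of
    session values that change the rendered page (hypothetical inputs).
    """
    # Sort parameters so that ?a=1&b=2 and ?b=2&a=1 map to the same key.
    canonical = "&".join(f"{k}={v}" for k, v in sorted(params.items()))
    raw = "|".join([path, canonical, *map(str, session_vars)])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()

# Same page, parameters in a different order: same key, so a cache hit.
key_a = make_cache_key("/c/shirts", {"color": "red", "page": 1})
key_b = make_cache_key("/c/shirts", {"page": 1, "color": "red"})

# A per-customer session variable makes the key unique per customer,
# which is the case where caching stops helping.
key_c = make_cache_key("/c/shirts", {"color": "red", "page": 1}, ("customer-42",))
```

Every extra session variable that goes into the key multiplies the number of distinct entries, which is why a highly personalized page is so hard to cache.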


So apart from caching you have to work on the basics, and the basic here is data modeling. You should always model your data so that the application does not end up iterating, consolidating or heavily manipulating it. If the data returned from layer 1 to layer 2 is in a form that can be presented directly, with little or no iteration, then you have done a good job. For example, a nested for loop is one of the biggest killers for eCommerce applications: performance keeps degrading as your data grows over time.
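
A small Python sketch of the nested loop problem, assuming a made-up list of products and prices with one price per product (the field names are hypothetical). Both functions return the same result, but the first scans the whole price list for every product, while the second builds a lookup once.

```python
def attach_prices_nested(products, prices):
    """O(n * m): for every product, scan the whole price list."""
    result = []
    for product in products:
        for price in prices:
            if price["product_id"] == product["id"]:
                result.append({**product, "price": price["amount"]})
    return result

def attach_prices_indexed(products, prices):
    """O(n + m): build a lookup once, then do one pass over the products."""
    price_by_id = {p["product_id"]: p["amount"] for p in prices}
    return [
        {**product, "price": price_by_id[product["id"]]}
        for product in products
        if product["id"] in price_by_id
    ]

# Illustrative data only.
products = [{"id": 1, "name": "shirt"}, {"id": 2, "name": "jeans"}]
prices = [{"product_id": 2, "amount": 30}, {"product_id": 1, "amount": 12}]
```

With two products the difference is invisible; with tens of thousands of products and prices the nested version does hundreds of millions of comparisons on every request.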


Let me explain the above principle with a data modeling assignment that I recently completed.


Suppose your eCommerce database stores product information in a hierarchical format:
one product that comes in three colors, where each color has three size variants. This means the product information is distributed across 13 objects (1 base product + 3 colors + 9 sizes).


This is a very common structure for any eCommerce site, especially for retailers in the apparel domain.


You are asked to design a data model for the search engine so that the application is able to display color swatches with price ranges on the product listing page. E.g. for color red with sizes M=$5, L=$7, XL=$10, XXL=$12, the price range becomes $5-$12.


For example, the screenshot below displays swatches with price ranges on a product listing page.


A search engine, whether Solr or Endeca, always manages data in a flat structure called a record or document. Search engines are built to present data quickly instead of joining across tables at run time. That is one of the basic differences between a search engine and a relational database.


In your case the product information is distributed across three levels.


So you have the following options:


Option 1 – Index data at the base product level. This means one record has to store all color and size information as multi-valued attributes.


Information distributed across the 13 objects described above needs to be mapped into one single record.
This becomes even worse when you also have to store image URLs for each color as multi-valued attributes.


If you follow a naming convention for image URLs, you might not need to index them. Otherwise your record will be very messy, and the web application fetching such records will end up splitting multi-valued data to identify and pick the corresponding URLs. We are building a lot of iteration into this design.


This design will become a bottleneck as your business grows and you enrich your products with more attributes at color or size level.


I would never recommend this solution.


Option 2 – Index data at the color level and ask the search engine to group records by base product id. In other words, use the parent product id as the rollup key (an Endeca-specific concept).


This way the data in the 13 objects is mapped into three records, and if you index the base product id you can use it as the rollup key (group by).


You will get an aggregated record list with a representative record that by default presents the values of one of the three records. In Endeca you have good control over the representative record: although it is one of the three, it can carry attributes that hold the lowest and highest price across all of them (called derived attributes in Endeca). The lowest and highest price become your price range.
Since your records represent colors, you have to keep size and price data as multi-valued attributes. But on search/listing pages you generally don’t need size-specific information. It is required on the product detail page, which you generally don’t render through the search engine.


The best part of this approach is that the search engine gives you data in almost the same format that you want to display on listing pages. So fast rendering is possible with less iteration.
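
The following Python sketch imitates what the rollup does for Option 2. It is not the Endeca API. The record layout, the field names and the prices are assumptions made up for illustration. The engine would do this grouping for you; the sketch only shows the shape of the data that comes back.

```python
from collections import defaultdict

# One record per color (Option 2). Sizes and prices are multi-valued attributes.
color_records = [
    {"base_product_id": "P1", "color": "red",   "sizes": ["M", "L", "XL"], "prices": [5, 7, 10]},
    {"base_product_id": "P1", "color": "blue",  "sizes": ["M", "L", "XL"], "prices": [6, 8, 12]},
    {"base_product_id": "P1", "color": "green", "sizes": ["M", "L", "XL"], "prices": [5, 9, 11]},
]

def rollup(records, key="base_product_id"):
    """Group color records by the rollup key and derive a price range."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record)

    aggregated = []
    for rollup_value, members in groups.items():
        all_prices = [p for m in members for p in m["prices"]]
        aggregated.append({
            key: rollup_value,
            "representative": members[0],      # first record stands in for the group
            "swatches": [m["color"] for m in members],
            "min_price": min(all_prices),      # derived attributes
            "max_price": max(all_prices),
        })
    return aggregated

listing = rollup(color_records)
```

Each aggregated entry already holds the swatches and the price range, so the listing page can render it with a single pass.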


Option 3 – Index data at the size level and use the base product id as the rollup key (group by).


The 13 objects are now mapped into 9 size records with a common base product key.
Duplication across records is high in this solution, and the web application has to identify colors by consolidating all size records. This can be a performance hit because you have to iterate over all size records and consolidate color and other attributes.
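
To show the extra work Option 3 pushes into the web application, here is a Python sketch with made-up size records (field names and prices are assumptions). The application has to walk every size record just to recover the distinct colors and the price range.

```python
# One record per size (Option 3): 3 colors x 3 sizes = 9 records.
size_records = [
    {"base_product_id": "P1", "color": color, "size": size, "price": price}
    for color, sized_prices in {
        "red":   {"M": 5, "L": 7, "XL": 10},
        "blue":  {"M": 6, "L": 8, "XL": 12},
        "green": {"M": 5, "L": 9, "XL": 11},
    }.items()
    for size, price in sized_prices.items()
]

def consolidate_colors(records):
    """Walk every size record to rebuild the distinct colors and price range."""
    colors = []
    prices = []
    for record in records:
        if record["color"] not in colors:
            colors.append(record["color"])
        prices.append(record["price"])
    return {"swatches": colors, "min_price": min(prices), "max_price": max(prices)}

summary = consolidate_colors(size_records)
```

This loop runs for every product on every listing page, and it grows with the number of sizes even though the page never shows them.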


Generally you don’t need size-specific information on listing pages. If that is the case, I recommend option 2 as the best way of modeling this requirement.


I hope I could link my principle of ‘less iteration’ with the very basic concept of ‘data modelling’.


Thanks.
http://www.linkedin.com/in/sumitag

Sunday, July 27, 2014

Hybris Design consideration

  • Ensure Hybris app servers are not serving static content. (High priority) Use a CDN for static media/page delivery. This may take up to 90% of the load off your infrastructure. If that is not possible, simply ensure the web server layer fronting hybris serves static media.
  • Ensure hybris servers are running on physical servers instead of virtual machines. VMs are supported, but our tests showed that hybris performs best on physical machines. (Low priority)
  • DB indexing – Hybris out-of-the-box DB indexes are poor, so ensure DB indexes are created/updated to match your model customizations, otherwise the site soon runs into deadlocks. (High priority) You should also drop unused indexes so that the DB does not waste its time on them. (Medium priority)
  • Perform SSL termination at the load balancer/web server. Don’t burden hybris app servers with SSL handshakes. (Medium priority)
  • Leverage web server modules for browser caching and compression. Don’t forget to disable unused Apache modules. (Low priority)
  • Ensure only required extensions are deployed on front-end hybris servers and avoid deploying any back office extension on the front end. This is good for security and performance. (Medium priority)
  • Ensure loggers are not configured in debug mode. In production, front-end boxes should log at Error level only. (Low priority)
  • Hybris applications are very prone to deadlocks because the staged and online catalog versions are maintained in the same database table. Avoid developing catalog version aware types. If you cannot, avoid making them part of catalog sync. E.g. price – touching the price of every product daily may result in a full catalog sync. Either maintain prices in the online version only, or update both versions at create/update time. This way you can remove the price item type from catalog sync.
  • The OOB Hybris catalog sync process is not a great design, and that is rectified in hybris 5.1. In the short term, developers should consider removing all unnecessary root types from the sync process. (High priority)
  • My several years of experience with Hybris say that if a site falls over after a few hours of operation, the issue is not with the front-end code, Tomcat configuration or infrastructure. The issue is definitely with the hybris application code. So don’t waste much energy optimizing css, js, images etc.; this may help a little but can’t solve the real instability issue. (Low priority)
  • Stock service – Ensure you are not checking stock when loading the product detail page or category listing pages, or on add/remove events on the basket page. Do this check only when adding an item to the basket, on submission of the basket page, and just before taking payment (i.e. this is where you need to reserve stock). (High priority)
  • Ensure the hybris stock service is used to check stock status and no logic is written against the Stock Model directly. (High priority)
  • Ensure the JALO and web app session timeouts are configured to the same value. (High priority)
  • 4 load-balanced hybris app servers should be more than sufficient for a decent load if the solution is designed correctly. Adding more servers to the infrastructure will not save you from a site crash. Trust me, you have to fix the core coding issues (not using Hybris services correctly) or the DB deadlocks.
  • Set the minimum and maximum heap sizes for the JVM to the same value. 8GB should be more than sufficient. More memory means longer GC pauses, and a GC pause means all threads are halted. So avoid adding more memory; it is not going to solve the problem.
  • TCP/UDP clustering configuration doesn’t matter much for a 4 app server cluster, but prefer sticking with the default UDP settings. Hybris works well in either case unless you have a serious networking issue. Use udpsniff to validate packets. (Low priority)
  • Solr – Ensure the solution is designed properly. You need to be really smart here.
  • Run Solr in standalone mode with one master server. Perform delta indexing frequently (every 10-15 mins). Prefer two-phase mode so that your site does not get stuck while indexing happens behind the scenes. Run a full index update once a week or only when you make a schema change. (High priority)
  • Use Solr as much as you can, because this saves DB calls. Hybris is very chatty with the DB because of its cache refreshing and lazy loading concept. I really hate this part of hybris, but I have learned to live with it and have identified ways to avoid DB-centric solutions. E.g. you can use Solr to render category listing pages, and you can index price and stock data and use that data as much as possible. The whole objective is to touch the DB only when it is really necessary. (High priority)
  • Disable Quick Search in hmc on the front-end and back-end hybris application servers. (Low priority)
  • If hybris is not your PIM and the merchandising team previews products in some other system before publishing, then you really don’t need multiple catalog versions in hybris. This saves a lot of overhead. (High priority)
  • Hybris customizations – you should be in a position to justify every customization you suggest. I have seen several implementations where customization was done although the functionality was available OOB. This happens when Java developers with little hybris knowledge work in an architect role. I have an endless list of such examples:
  • Order invoice generation/re-generation in PDF format was the requirement, and the solution implemented was purely custom, using an open source API. The developers didn’t realize that all of this is available in hybris OOB.
  • Another example: purging/archiving items older than 30 days was a requirement, and I have seen developers write lots of Java code, schedulers etc. to achieve this, although it can be done without a single line of code. Hybris OOB lets you configure purging of any item type. You can configure this manually through hmc or write an impex.
  • The key point is to avoid re-inventing the wheel and to look for the tested wheels already available in the system that you bought. (High priority)
  • Ensure your custom types defined in items.xml persist data into their own tables rather than piggybacking on the generic table. (High priority)
  • A common mistake is that developers cut and paste type and relation definitions in items.xml files, which may unintentionally set relation ends to be ordered. As I said, Hybris is very chatty with the DB, so you should be very cautious while defining your types and relations. Copy-paste may make it even more DB chatty when you could have avoided this. (High priority)
  • If no deployment table is defined for a many-to-many relation, a generic join table is used to store the relation, which is not optimal for performance. So ensure you define a deployment table for your custom many-to-many relations.
  • Any relation end that does not need to be ordered should have the ordered attribute set to false.
  • Collection types – Avoid defining collection types. To maintain data integrity, always use a relation.
  • Avoid creating history records if possible, or purge them at regular intervals. Creating a history record through the auditing service might look fancy but can become a big performance overhead very quickly. Imagine creating an audit record for each stock level change. (Medium priority)
  • Cron job logs – Hybris forgot to add pagination, and this kills your hmc when you open a job with thousands of log files. (Medium priority)
  • Ensure you use the hybris WCMS navigation node design for Mega menu construction so that you do not end up building the nested category hierarchy on every hit.
  • Ensure passwords of default users (customer/employee/none) are changed and made complex enough to be hard to guess.
Default users - admin, anonymous, vjdbcReportsUser, csagent, cmsmanager
Note -
1. Hybris creates these users when you perform initialize/update. It only creates a user if it does not already exist in the users table. So if you have changed a password once, hybris won’t override it when the update process is re-run. But if you delete a default user, it will be created again with the default password when you run the hybris update with the corresponding extension selected.
2. If you are dropping in a new OOB extension for some new functionality, it is worth checking whether the new extension creates a user record. If it does, it is your responsibility to change the password, at least in the live environment.
 

I will share more experience as and when I get time, but post your comments if you need more details in any specific area. Feel free to ask any question related to Hybris, Solr, Endeca or webMethods. For my own learning, I am after questions that I can't answer.
Thanks.


 

Tuesday, June 25, 2013

Hybris Performance


For best performance results on your Hybris eCommerce system, below are a few key things.

Ensure the following:

Your cluster broadcasting sender thread max wait is updated from the default 60000 to 60 seconds
cluster.broadcast.senderthreads.maxwait=60

Prefer UDP as a broadcasting method instead of TCP.

Keep the Tomcat connector maxThreads at 400; going beyond this adds little value but more overhead.

Allocating bigger cache sizes does not help and sometimes ends up eating a lot of memory, so ensure cache sizes are determined after some testing and analysis.

Keep back office servers separate from customer facing hybris nodes and distribute your cron jobs across multiple back office nodes.

Lucene indexing - I am sure you are using Solr or some other search engine for customer facing search requirements. If so, keeping Lucene indexes just for hmc search does not make sense given the overhead they add to the system. I suggest permanently disabling all Lucene rebuild and update jobs.

Physical vs VM machines - Hybris instances perform far better on a physical machine. If possible, never opt for VMs for your hybris cluster.

Cluster size - Be careful here; continuing to add nodes to an existing cluster might not be the right approach.

A reasonable cluster configuration is -
4 hybris front instances  - 400 http + 200 https connector threads
2 hybris back office  - 200 http + 100 https connector threads
The above configuration is good enough to handle ~6000 concurrent hits on your site where the concurrency level is 1 sec.

1 hit = 1 connector thread
You must be thinking that 6000 concurrent hits means 6000 threads, and we have not configured that many, so how can this work?

Will all threads stay busy for 1 sec (our concurrency level)? No. Some requests will get a response in microseconds, some in milliseconds, and some might take a few seconds.
If any thread stays in your system for more than 3-4 seconds, a serious code review is required.
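
The arithmetic behind this can be sketched with Little's law (average busy threads = arrival rate x average time in the system). I did not name this law above, so treat it as my framing of the same idea. The sketch assumes the connector thread counts are per front instance and that all 6000 hits arrive within one second.

```python
def threads_in_use(hits_per_second, avg_response_seconds):
    """Little's law: average busy threads = arrival rate * average time in system."""
    return hits_per_second * avg_response_seconds

def max_avg_response_seconds(total_threads, hits_per_second):
    """Largest average response time the thread pool can sustain at this rate."""
    return total_threads / hits_per_second

# Assumption: the connector thread counts are per front instance.
front_instances = 4
threads_per_instance = 400 + 200          # http + https
total_front_threads = front_instances * threads_per_instance

# Average response time budget at 6000 hits per second.
budget = max_avg_response_seconds(total_front_threads, 6000)
```

Under these assumptions the 2400 front threads cope with 6000 hits per second only while the average response time stays at or below roughly 0.4 seconds. At 3 seconds per request the same load would need 18000 threads, which is why slow requests call for a code review rather than more threads.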

Page views in an hour, day, month etc. do not matter. You should prepare for the lowest possible concurrency level. If your system can handle the load for a given time 't', then the application will last for longer.

Media serving - Ensure your hybris instances are not serving any static content. Application servers should be doing something more intelligent. You can simply configure your front web server/load balancer to serve media directly. Review your hybris access logs and ensure no static content requests reach them.

CDN - If your organization can afford one, definitely go for it. It can offload even more than 90% of the traffic. That means less infrastructure cost and less overhead; moreover, you can cache not only static content but also dynamic pages, if coded smartly.

continued......

Wednesday, August 1, 2012

Why Hybris?


Why should retailers opt for the Hybris platform as their future eCommerce platform?

Is it Scalable?
                Hybris is a Spring/JEE based platform and it is as scalable as any other Java based platform.
                Adding a new cluster node is just a matter of starting a new hybris instance with 3 property changes.
                Hybris will automatically add that node to the cluster. It has a distributed cache per server and its out-of-the-box cache invalidation over UDP is very efficient.

Will it perform if we have over 1 million SKUs?
                Yes, if you have used the APIs and flexi search in the right way. If you use pagination in your flexi search queries and converter caches, then it will not matter how many SKUs you have.
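
As a language-neutral illustration of the pagination point, here is a small Python sketch. It is not hybris code: fetch_page(start, count) is a hypothetical stand-in for a paged query, and the SKU list is an in-memory fake. The idea is that only one page of rows is held at a time, no matter how large the catalog is.

```python
def iter_in_pages(fetch_page, page_size):
    """Yield items page by page so that only `page_size` rows are held at once.

    `fetch_page(start, count)` is a hypothetical stand-in for a paged query;
    it must return at most `count` items beginning at offset `start`.
    """
    start = 0
    while True:
        page = fetch_page(start, page_size)
        if not page:
            return
        yield from page
        if len(page) < page_size:
            return            # short page: nothing left to fetch
        start += page_size

# In-memory stand-in for a table of SKUs.
skus = [f"SKU-{i}" for i in range(10)]
calls = []

def fake_fetch(start, count):
    calls.append((start, count))
    return skus[start:start + count]

result = list(iter_in_pages(fake_fetch, 4))
```

Here ten SKUs are read in three calls of at most four rows each; with a million SKUs the memory held per call stays the same.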

Is it configurable, or does it require a deployment for every sort of change?
                Mostly configuration, as long as you have not done your own customizations. You can even expose your custom elements to hmc and change your configuration from there on the fly.

How easy is it to learn?
                If you know Spring and understand basic ecommerce concepts, not much effort is required to learn the hybris platform. You need to be familiar with impex and flexi search syntax, but plenty of examples are available for learning them.

Do they have an onDemand flavour as well?
                Yes, recently introduced as an offering.

Can we use hybris for product enrichment also?
                Yes, their PIM module is meant for this.

How would you compare this with ATG or WCS?
                Very soon hybris will be on top of these because of its simplicity.

Can we use this as a fully fledged content management solution?
                No. If your site is very content-centric and you want to manage specific versions of individual content, then WCMS is not the right solution for you.
                Otherwise it has all the features that are required for an ecommerce application.

How flexible is the promotion module?
                Very. Many promotions are available out of the box and you can extend the framework very easily to introduce your custom promotions.

What about search capabilities?
                Google Commerce Search over the cloud. What else do I need to say? If GCS does not suit you, then use Solr as a free solution. It is available OOB.

What integration topologies does it provide?
                Whatever you want... Pre-integrated ActiveMQ on Tomcat is available for messaging. A RESTful web services framework is available OOB.

Does it have business process capabilities too?
                Yes, limited, but you can define your workflows. Moreover, you can modify your workflow decisions on the fly through hmc.

How secure is it?
                Spring security is integrated OOB.

How easy is it to migrate from an existing Magento or custom ecommerce solution to the hybris platform?
                It depends, but it is highly feasible. Their data import process is very flexible. Once the data mapping is done, it is easy to migrate.

How frequent are their releases?
                They have a pre-defined roadmap. Please refer to wiki.hybris.com

What is the difference between PIM and PCM?
                Consider them synonyms.

What training paths are available to learn hybris?
                Refer to wiki.hybris.com

Is it a true multi-site, multi-store, multilingual platform?
                Yes. OOB - Out of the box.

Thursday, October 11, 2007

Design Pattern – “An Object Oriented Designer thinking kit”

Life without it - like constructing your dream house without any layout and
realizing the problems in the middle of or after construction.

Life with it – now you have a layout that you worked on a lot before the actual
construction, e.g. proper spaces for ventilation, future expansion, vastu
compliance, parking, entertaining guests, security systems and many more.

The whole idea behind design patterns is to develop a standardized way to represent general solutions to commonly encountered problems in software development.

Direct Benefits –
Biggest benefit - An easily expandable and reusable structure.
Minimized testing cost - Changes lead to less re-testing overhead.
OOP – All the benefits of being object oriented.

Indirect Benefits –
An effective way to share experience: over time, we can build up catalogs of patterns. This enables newcomers to software development to benefit more effectively from experience gained over the years.
There is formal documentation about the tradeoffs involved in software design decisions, and about the pluses and minuses of development choices. Standardizing patterns makes it easier for all development professionals (beginners and experts alike) to explicitly understand the implications of their decisions.
Design patterns provide a common vocabulary. This makes communicating decisions to developers easier. Rather than describing a design in detail, we can use a pattern name to explain our plans.
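
As an illustration of the vocabulary point, saying "pricing uses the Strategy pattern" conveys roughly the following structure. This is a minimal Python sketch; the class name Checkout, the function names and the 50% discount are hypothetical.

```python
from typing import Callable, List

# Strategy pattern: the pricing rule is an interchangeable object (here a
# plain callable) chosen at runtime, so new rules need no change to Checkout.
PricingStrategy = Callable[[float], float]

def regular_price(amount: float) -> float:
    return amount

def member_discount(amount: float) -> float:
    return amount * 0.5   # hypothetical 50% member discount

class Checkout:
    def __init__(self, strategy: PricingStrategy) -> None:
        self.strategy = strategy

    def total(self, amounts: List[float]) -> float:
        # Apply the chosen strategy to every line amount and add them up.
        return sum(self.strategy(a) for a in amounts)
```

Swapping regular_price for member_discount changes the total without touching Checkout, which is the whole message the pattern name carries.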

Pattern Best Contributor - “Gang of Four” or GoF
1. Erich Gamma
2. Richard Helm
3. Ralph Johnson
4. John Vlissides

  • Design Patterns
    Creational: Abstract Factory, Builder, Factory Method, Prototype, Singleton
    Behavioral: Observer, Chain of Responsibility, Command, Interpreter, Iterator, Mediator, Memento, State, Strategy, Visitor, Template Method
    Structural: Adapter, Bridge, Composite, Decorator, Facade, Flyweight, Proxy
    System: MVC, Session, Worker Thread, Callback, Successive Update, Transaction

If you feel you have solved a design problem in your own way and that approach can be useful for others, then blog about it and patent your "Design Pattern".

sumit

Puzzle for brain exercise

Five pirates have 100 gold coins. They have to divide up the loot. In order of seniority (suppose pirate 5 is most senior, pirate 1 is least senior), the most senior pirate proposes a distribution of the loot. They vote, and if at least 50% accept the proposal, the loot is divided as proposed. Otherwise the most senior pirate is executed, and they start over again with the next most senior pirate. What solution does the most senior pirate propose? Assume they are very intelligent and extremely greedy (and that they would prefer not to die).